## Introduction
- Load the tidyverse, ggplot2, rtweet, and readr packages
library(tidyverse)
library(ggplot2)
library(rtweet)
library(readr)

This data set was scraped from WineEnthusiast, a website that reviews and rates many different types of wine.
wines <- read.csv(file = '../data/processed_data/wines.csv')
cannot open file '../data/processed_data/wines.csv': No such file or directory
Error in file(file, "rt") : cannot open the connection
set.seed(19630217)
wine_sample <- sample_n(wines, 1000)
EDA: examine the correlation between price and points, possibly with the DataExplorer library, following [this guide](https://datascienceplus.com/blazing-fast-eda-in-r-with-dataexplorer/).
wines %>%
  ggplot() +
  geom_point(mapping = aes(x = points, y = price), na.rm = TRUE)
wines %>%
  summarize(
    mean_price = mean(price, na.rm = TRUE),
    min_price = min(price, na.rm = TRUE),
    max_price = max(price, na.rm = TRUE),
    sd_price = sd(price, na.rm = TRUE)
  )

wines %>%
  summarize(
    mean_points = mean(points, na.rm = TRUE),
    min_points = min(points, na.rm = TRUE),
    max_points = max(points, na.rm = TRUE),
    sd_points = sd(points, na.rm = TRUE)
  )
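The relationship between price and points explored above can also be summarised with a single correlation coefficient. The following is a minimal sketch, assuming a small made-up data frame `toy_wines` that stands in for `wines`; its values are invented for illustration and are not taken from the WineEnthusiast data.

```r
library(dplyr)

# Hypothetical stand-in for `wines`; the values are invented for illustration.
toy_wines <- data.frame(
  points = c(85, 88, 90, 92, 95),
  price  = c(12, 20, NA, 45, 120)
)

# Pearson correlation between price and points,
# using only rows where both values are present.
price_points_cor <- toy_wines %>%
  summarise(r = cor(price, points, use = "complete.obs"))

price_points_cor
```

Replacing `toy_wines` with `wines` should give the correlation for the full data set, provided both columns are numeric.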
Select provinces by points, then identify the best provinces for wine based on the average points in the 1,000-wine sample.
#Find the average points across the 1,000 sampled wines
#Note: this is the overall mean of the sample, not a per-province mean
wine_per_province <- wine_sample %>%
  select(province, points) %>%
  summarise(points = mean(points))
wine_per_province
#Find the best province for wine using the average points across the 1,000 samples
#Drop the descriptions or just select price? Set points to max(points)
#Keep wines scoring above the sample mean (88.669 in the original run)
best_province <- wine_sample %>%
  group_by(province, points) %>%
  filter(points > wine_per_province$points)
best_province
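Filtering above the sample mean keeps individual wines, so it does not by itself name a single best province. One possible way to rank provinces by their average points is sketched below. It assumes a small made-up data frame `toy_sample` standing in for `wine_sample`; the names `province_ranking`, `top_province`, `mean_points`, and `n_wines` are illustrative and not part of the original analysis.

```r
library(dplyr)

# Hypothetical stand-in for `wine_sample`; the values are invented for illustration.
toy_sample <- data.frame(
  province = c("California", "California", "Tuscany", "Tuscany", "Bordeaux"),
  points   = c(88, 92, 91, 93, 87)
)

# Mean points and number of wines per province, highest mean first.
province_ranking <- toy_sample %>%
  group_by(province) %>%
  summarise(mean_points = mean(points, na.rm = TRUE), n_wines = n()) %>%
  arrange(desc(mean_points))

# The province with the highest mean points.
top_province <- province_ranking %>% slice(1)
top_province
```

The `n_wines` column is included because a province with only a few wines in the sample can have an unreliable average.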